Given a set of strings, the shortest common superstring problem is to find the shortest possible string that contains all the input strings. The problem is NP-hard, but a lot of work has gone into designing approximation algorithms for solving the problem. We present the first time- and space-efficient implementation of the classic greedy heuristic, which merges strings in decreasing order of overlap length. Our implementation works in $O(n \log \sigma)$ time and bits of space, where $n$ is the total length of the input strings in characters, and $\sigma$ is the size of the alphabet. After index construction, a practical implementation of our algorithm uses roughly $5 n \log \sigma$ bits of space and reasonable time for a real dataset that consists of DNA fragments.
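
To make the greedy heuristic concrete, the following is a minimal, naive sketch of it in Python. It is not the time- and space-efficient implementation described above: it recomputes all pairwise overlaps in every round, so it is only meant to illustrate the merge order. The function names `overlap` and `greedy_superstring` are illustrative assumptions, as is the preprocessing step that drops duplicate strings and strings contained in another input string.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(strings: list[str]) -> str:
    """Naive greedy heuristic: repeatedly merge the pair with the longest overlap."""
    # Assumed preprocessing: drop duplicates and strings that occur inside
    # another input string, since any superstring covers them for free.
    uniq = list(dict.fromkeys(strings))
    items = [s for s in uniq if not any(s != t and s in t for t in uniq)]

    while len(items) > 1:
        # Find the ordered pair (i, j) with the maximum overlap.
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the pair, writing the overlapping part only once.
        merged = items[best_i] + items[best_j][best_k:]
        items = [s for idx, s in enumerate(items) if idx not in (best_i, best_j)]
        items.append(merged)

    return items[0] if items else ""


if __name__ == "__main__":
    reads = ["abc", "bcd", "cde"]
    print(greedy_superstring(reads))
```

Ties between equally long overlaps are broken here by scan order, which is an arbitrary choice; the greedy result is a common superstring of the inputs but is not guaranteed to be the shortest one.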